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A HIGHLY OPTIMIZED METHOD 
FOR MODELLING, LA. ANALYSIS AND SYNTHESIS, 
A WINDOWED SIGNAL WITH A VARIABLE LENGTH, 
AND THE USE OF THIS METHOD FOR PITCH ESTIMATION, 
SOURCE SEPARATION, AUDIO EFFECTS, AUDIO CODING, 
AUDIO ANNOTATION AND TRANSCRIPTION 



FIELD OF THE INVENTION 

The present invention relates to the modelling (analysis and synthesis) of musical signals 
and speech. The analysis method computes of amplitudes, phases and frequencies using a 
io non linear least squares estimation technique. The synthesis comprises the reconstruction of 
the signal from these parameters. The method allows a variable window length and is highly 
optimized, reducing the time complexity of the basic least squares method from 0(N 3 ) to 
Nlog(N). The present invention further relates to computer programs and devices therefore. 

BACKGROUND TO THE INVENTION 

The sinusoidal modelling of sound signals such as music and speech is a powerful tool for 
parameterizing sound sources. Once a sound has been parameterized, it can be synthesized 
for example, with a different pitch and duration. 

A sampled short time signal on which a window w n is applied may be represented by a 
model x n . consisting of a sum of sinusoids which are characterized by their frequency u k , 
jo phase <f> k and amplitude a k , 

K-l 



Jj— + <Pk) + r n (!) 



The offset value n 0 allows the origin of the timescale to be placed exactly in the middle of 
the window. The noise residual is denoted r n . For a signal with length N, n Q equals ^1. 
If the signal would be synthesized by a bank of oscillators, the complexity would be 
2s O(NK) with N being the number of samples and K the number of sinusoidal components. 



PC T/BE 0 3 / 0 0 2 0 7 



As described in patent WO 93/03478, the computational efficiency of the synthesis can 
be improved by using the inverse fourier transform. By using a window that has a small 
bandwidth in the frequency domain, the frequency response of each partial can be computed 
in constant time. A computation technique was disclosed in WO 93/03478 by using look-up 
s tables. 

Patent application WO 90/13887 discloses the estimation of the amplitudes by detecting 
individual peaks in the magnitude spectrum, and performing a parabolic interpolation to 
refine the frequency and amplitude values. In WO 93/04467 and WO 95/30983 a least 
means squares is presented for individual sinusoidal components which is applied iteratively, 
io subtracting a single sinusoidal component in each iteration. 

Presently, methods of the prior art achieve synthesis by inverse Fourier transformation 
using a fixed window length. However, such techniques are disadvantageous because "smear- 
ing" can occur, for example, when the length of the window is too large. This leads to an 
inaccurate synthesis of the sound, 
is Techniques of the prior art for analysing sound signals cannot resolve overlapping fre- 
quency responses of sinusoidal components with close frequencies. These overlapping peaks 
occur when multiple sound sources are mixed together or in monophonic signals with strong 
reverberation. Also when very small windows are used, all frequency responses can overlap 
even for a monophonic sound. 
20 Methods of the prior art use a basic least squares computation for analysing the signal. 
This requires a time complexity <D(K 2 N) and a space complexity 0(K 2 ). Due to the com- 
plexity of the calculation, methods of the prior art make heavy demands on the computing 
power. 

There is a need for a method for analysing and synthesising sound signals that overcomes 
25 these problems. 



2 
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SUMMARY OF THE INVENTION 

The present invention relates to the modelling (analysis and synthesis) of musical signals 
and speech and provides therefore a method for modelling, La. analyzing and/or synthesiz- 
ing, a windowed signal with a variable length, by computing the frequencies and complex 
s amplitudes from the signal by using an optimized least squares method, whereby the method 
is executed using one or more variable length windows and has a complexity <D(N log N). 

The numerous computational optimizations that are disclosed in this invention allow 
to reduce the original complexity of the non-linear least squares method which is 0(N S ) to 
<D(N log N). It is known that the great advantage of the least squares method is that it is able 
io to resolve overlapping peaks and is therefore more accurate than iterative methods. Today, 
iterative methods are widely used because of their smaller computational complexity. The 
present invention allows to speed up the superior least squares method to a computational 
complexity comparable with iterative methods. 

The invention improves several applications such as accurate pitch estimation, parametric 
is audio coding, source separation, audio effects and automated annotation and transcription 
The method computes the amplitudes, phases and frequencies using a non linear least 
squares estimation technique which is depicted in Figure 14. 

Two types of models are distinguished in the invention. The first model consists of a 
superposition of an arbitrary set of sinusoidal components. The second model consists of a 
20 set periodic sources that are described by a harmonic series of sines. All sines of a given 
source are a multiple of the fundamental frequency. 

For the harmonic model, a multipitch estimator provides a set of frequencies correspond- 
ing to the fundamental frequency of each source. For the non-harmonic model, the initial 
frequency values are obtained by detecting local maxima in the sampled spectrum. 
25 In a pre-processing step of the amplitude estimation, the frequencies are sorted and fre- 
quencies that occur multiple times are omitted. For each sinusoidal component the number 
of frequencies are counted that fall into its main lobe. The maximum of this number over 
all components yields that number of diagonal bands that are considered for the amplitude 
estimation. 

3o The amplitude calculator computes the amplitudes of the sinusoids using a least squares 
method. We show that by incorporating a window with a band limited frequency response 
in the least squares derivation, a band diagonal system of equations is obtained. Recall that 



3 
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the number of diagonal bands was determined during the pre-processing. This results in the 
fact that now the equations can be solved in linear time. The computation of the individual 
elements in the system is optimized using fast frequency response computation methods. 
The estimated amplitudes and frequencies are then used to compute the spectrum of the 
5 signal model. The difference with the observed spectrum yields the spectrum of the residual 
signal which is used for the optimization of the frequencies. 

The frequencies are optimized by making a local quadratic approximation of the error 
function and are iteratively adjusted using Newton's method. This method requires the 
computation of the gradient and Hessian matrix of the error function. For the computation 
l0 of the elements of the Hessian matrix and gradient, fast frequency response calculators are 
used. The band limited property of the window results in a band diagonal Hessian matrix 
which can be inverted in linear time. The optimized frequencies are used in the next iteration. 
This is repeated until a stopping criterium is met. 



4 
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SUMMARY OF THE INVENTION 

The present invention relates to the modelling (analysis and synthesis) of musical signals 
and speech and provides therefore a method for modelling, i.a. analyzing and/or synthesiz- 
ing, a windowed signal with a variable length, by computing the frequencies and complex 
amplitudes from the signal by using an optimized least squares method, whereby the method 
is executed using one or more variable length windows and has a complexity 0(N log TV). 

The numerous computational optimizations that are disclosed in this invention allow 
to reduce the original complexity of the non-linear least squares method which is 0(N 3 ) to 
0{N log N). It is known that the great advantage of the least squares method is that it is able 
to resolve overlapping peaks and is therefore more accurate than iterative methods. Today, 
iterative methods are widely used because of their smaller computational complexity. The 
present invention allows to speed up the superior least squares method to a computational 
complexity comparable with iterative methods. 

The invention improves several applications such as accurate pitch estimation, parametric 
audio coding, source separation, audio effects and automated annotation and transcription 
The method computes the amplitudes, phases and frequencies using a non linear least 
squares estimation technique which is depicted in Figure 14. 

Two types of models are distinguished in the invention. The first model consists of a 
superposition of an arbitrary set of sinusoidal components. The second model consists of a 
set periodic sources that are described by a harmonic series of sines. All sines of a given 
source are a multiple of the fundamental frequency. 

For the harmonic model, a multipitch estimator provides a set of frequencies correspond- 
ing to the fundamental frequency of each source. For the non-harmonic model, the initial 
frequency values are obtained by detecting local maxima in the sampled spectrum. 

In a pre-processing step of the amplitude estimation, the frequencies are sorted and fre- 
quencies that occur multiple times are omitted. For each sinusoidal component the number 
of frequencies are counted that fall into its main lobe. The maximum of this number over 
all components yields that number of diagonal bands that are considered for the amplitude 
estimation. 

The amplitude calculator computes the amplitudes of the sinusoids using a least squares 
method. We show that by incorporating a window with a band limited frequency response 
in the least squares derivation, a band diagonal system of equations is obtained. Recall that 
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the number of diagonal bands was determined during the pre-processing. This results in the 
fact that now the equations can be solved in linear time. The computation of the individual 
elements in the system is optimized using fast frequency response computation methods. 

The estimated amplitudes and frequencies are then used to compute the spectrum of the 
signal model. The difference with the observed spectrum yields the spectrum of the residual 
signal which is used for the optimization of the frequencies. 

The frequencies are optimized by making a local quadratic approximation of the error 
function and are iteratively adjusted using Newton's method. This method requires the 
computation of the gradient and Hessian matrix of the error function. For the computation 
of the elements of the Hessian matrix and gradient, fast frequency response calculators are 
used. The band limited property of the window results in a band diagonal Hessian matrix 
which can be inverted in linear time. The optimized frequencies are used in the next iteration. 
This is repeated until a stopping criterium is met. 
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BRIEF SUMMARY OF THE FIGURES 

Figure 1 depicts the frequency responses of the Blackmann- Harris window and the first 
and second derivative of frequency response. 

Figure 2: depicts the frequency responses of the zero padded Blackmann- Harris window, 
the frequency response of the squared window and its second derivative. 

Figure 3 depicts the theoretic motivation for the scaled look-up table. 

Figure 4 depicts the scaled table look-up. 

Figure 5 depicts the analytic frequency response computation. 

Figure 6 depicts the sampled frequency response interpolation. 

Figure 7 depicts the optimized synthesis method. 

Figure S depicts the optimized amplitude computation. 

Figure 9 depicts the pre-processing for the amplitude computation. 

Figure 10 depicts the calculation of the initial frequency values. 

Figure 11 depicts the frequency optimization for the non-harmonic model. 

Figure 12 depicts the frequency optimization for the harmonic model. 

Figure 13 depicts a subroutine of the frequency optimization for the harmonic model. 

Figure 14 depicts the complete Analysis/Synthesis method. 

Figure 15 depicts some applications for which the present invention will provide a con- 
siderable improvement. These applications are: 1) high accuracy pitch estimation, 2) audio 
coding, 3) audio effects and 4) source separation. 
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DETAILED DESCRIPTION OF THE INVENTION 

1 Synthesis 

1.1 Introduction 

The present invention provides an extension to inverse fourier transform synthesis by 
. allowing variable window lengths that not necessarily need to be a power of two. Since this 
is required for the inverse fast fourier transform, a window with length M is zero padded up 
to a length N = 2^)1. This results howeyer fa a ^ ^ 

frame. Three methods are disclosed for computing the frequency response of a window with 
a variable length. Any of these methods can be used to compute the frequency responses of 
» the windows that are applied throughout the invention. In the illustrations they are denoted 
as variable length freguency response calculators. 

1.2 Frequency Response of the Window 

Typically a window is chosen that has a frequency response with a limited bandwidth 
containing the main lobe. The response outside this band is very small so that it can be 
» neglected. This justifies the fact that only the main lobe of the frequency response must be 
computed in the synthesized spectrum X m . In particularly, but not exclusively, we cite the 
Blackmann-Harris window 

w n = « + &cos(27r^^) + CC os(47r^°) + rfcos(6 7 r^°) (2) 

with a = 0.35875, b = 0.48829, c = 0.14128 and d = 0.01168. The frequency response of 
20 the Blackmann-Harris window is shown in Fig. 1. Any other window with a bandlimited 
frequency response can be applied. 

1.3 Inverse FFT synthesis 

For mathematical simplicity, the time domain signal is described as a complex signal 

K-l 

s with 



A k = a k exp(i<j> k ) 
6 



(4) 
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of which only the real part of x n is retained as the synthesized signal. Evidently, the signal 
model x n has a reciprocal spectrum model X m . 

1 JV-1 

1 x — \ ~ , . n — no N . . 

Xn = 77 2^ Xm exp(27r^m ) (5) 

m=0 



where X m can be written as 



JV-1 



n—Q 



A fc exp(27T2^ fc — ) 



exp(— 2mra l jp^ ) 



K-i N-i n — n 

= -Afc ^ w n exp(-27n(m - a; fe ) n n ° ) 

#0 = Y^AkWim-oJk) (6) 

A;=0 

where W(m) denotes the discrete time fourier transform of w n . The spectrum model X m 
is a linear combination of frequency responses of the window, which are shifted over u>i and 
io weighted with a complex factor A\. The inverse FFT synthesis algorithm is depicted in Fig. 
7. 

1.4 Frequency Response Computation 

The use of the inverse fast fourier transform (IFFT), requires that the signal length is a 
power of two. When the window size is chosen to be fixed with a power of two length, an 
15 oversampled version of the frequency response can be computed on forehand and be used as 
0 a table look-up (see WO 93/03478). 

However, in this invention it is desired to use a variable window length M which is not 
necessarily a power of two. The length of the window can be made a power of. two again 
by adding zeros. Three methods are presented to compute the frequency response of a zero 
20 padded window being 

1. Scaled Table Look-up 

2. Analytic Frequency Response Computation 

3. Sampled Frequency Response Interpolation 

1.4.1 Scaled Table Look-up 



7 
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The fourier transform of a window with length N is denoted as W N (m), being 

W N (m - n 0 ) - f>"(n - n Q ) exp(-27n (n ~ n ° )(m = ■ (7) 

n=0 ■** 

with n 0 = ^1. The window ^(n) is now padded up to a length P which is denoted w%(n). 
Its frequency response Wg(m) can be expressed in terms of the non padded version W N {m) 
s using 

Wfom-po) = £ 1 ^(n-p 0 )exp(-2 7 ri^M!LZ^) ) 



P 

= ^V(n - n 0 ) exp(- 27 ri (m ~ n °)) 



n=0 
7V-1 



n=0 



p ~ n °) (8) 
where m ranges from 1 to P and p 0 = £=1. This shows that when the frequency response of 
io a window is zero padded, a scaled version of the original frequency response is obtained. 
When expressing w£(n - p 0 ) as the inverse fourier transform of Wfi(m-p 0 ) 

<(n ~ Po) = £ E - p 0 ) exp(2 7 rz (n -^ m -^ ) (9) 

m=0 ^ 

and truncating W£(m - p 0 ) to a length JV, we obtain - n 0 ). Its inverse fourier 

transform w^(n — n 0 ) can be written as 

<(n-no) = ^ E Wg(m - n,) «xp(2iri ( " ~ ~ *°) ) 

= ^ E (" - P») «p(2^n (W - "° )P - ) 

m=0 ^ ^ 

P P 

M Nr N , 



using 



We can now derive 



N__P 
M~N 



Kin-no) = ±W»(m - n 0 ) «xp(2*t ( " = "°^ m = ^ ) 
= Wg(m - Po) exp(2,ri (n ~ no ]^ m ~ *>> ) 
= j^W^m - no) exp(2*i ( " " ^ " ^ ) 
>^ - " no) ex P (2 7 ri (w -^-^ ) 
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what shows explicitly that a window with length M zero padded up to N can be computed 
from the inverse fourier transform of a scaled frequency response W N (m. - n Q ). This justifies 
the use of a scaled table look-up of the frequency response. This derivation illustrated by 
Fig. 3 and the scaled look-up method by Fig. 4. 
5 When a window with length M is zero padded up to a length N, the main lobe is enlarged 
up to 2^/3. Therefore, the synthesis of a frequency co k (see Eq. 6) requires the computation 
for all frequency domain samples m.for which 

rrimin < rn < m rnax 
m min = \uj k - ~(3j 

r N on 

" m max = \U k + JfiP] (10) 

From Fig. 1 one can observe that f3 = 4 is a good value for the Blackmann-Harris window. 
Fig. 2 shows the frequency response of a window which is doubled in length by zero padding. 
This frequency response is still bandlimited but its bandwidth is doubled. When comparing 
this with figure 1, it is clear that this is the same frequency response, scaled with a factor 2. 

is Conclusion: As a first method for the frequency response computation, the invention 
uses a look-up table containing an oversampled frequency response W N (m) of the main lobe 
and a scaling factor § yielding W N (m§). The method is illustrated in Fig. 4. This method 
is very fast but its accuracy is dependent on the size of the look-up table and might not be 
the best solution when strict memory constraints must be taken into account or when very 

20 • high accuracy is required. 

1.4.2 Analytic Frequency Response Computation 

In the case that memory constraints prohibit the use of a look-up table or when a very 
high accuracy is desired it is appropriate to compute the analytic form of the frequency 
response function. For every sample m, the response is still computed in constant time but 
25 this is clearly slower then the scaled table look-up method. The frequency response of a zero 



i 
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padded Blackmann-Harris window is now computed. 



M-1 



= ^2 w n exp(— 27rim — 



M-1 



= £ 



n— 0 
C 



2V exp(47ri 



. n — ro 0 ^ , _ , J . n — n 0 . 



-) + exp(— 47ri- 



M 



) + 



rf (exp(6 7 r^) + exp(-6^^) 



2 

M-1 



M 



exp ( — 2mm - ) 



= ]P ^aexp(-27r«(n-n 0 )^)+ 



n=0 



2 (« X P(2^-^(1 



- ( exp(27r*— ^(2 - 



AT 

Mm 

N 
Mm 



)) + exp(-27u-_— (l + — ))J + 



iV 
Mm, 



M 



ff) ) t ^,»^) + 



) 



Since n 0 = and using the equality 



N-l iV-1 
7 y 07 n 2 — x ~ \ 37 
n=0 n =0 



X 2 x 2W — 1 
X ~-2 X — 1 



N 
X 2 



_ jV 
X 2 



#2 — x 2 



with x = exp(27rzm/iV), it is obtained that 



N-l 



^ exp(27rzm 



n=0 



n — n 0 



sin(TT^) 



resulting" in 



(ii) 



(12) 



= a 

b 
2 



W»(m) 
sin(Tr^) 



+ 



sin(Trf) 

sin ( 7r (^ _ !)) sin(7r (Z2M + 1)} 



sin(7ri(^M _ i)) + sin(7r X(^£ + 
" sin(7T( ^ - 2)) , sin(7r(^M + 2)) 



2 [ S in(7ri(^ - 2)) + sinfri^ + 2)) 



+ 



+ 



d 
2 



an«2f -3)) , sin(7r(2BM + 3)) 



sin(7ri(^ _ 3 )) + sin(7r^(^M + 3)) 



(13) 
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When no zero padding is applied, meaning that M = N, this can be simplified further to 



= a 



W N (m) 
sin(7rm) 



+ 



b (" sin(7r(m — : 
2 Lsin(7r^(m - 
c [~ sin(7r(m — ! 
2 [sin(7r^(m — 



- 1)) , sin(7r(m + l)) 

r 



2 



1) ) B in(7ri(m + 1)) 
2)) sin(7r(m + 2)) 

2) ) sin(7ri(m + 2))_ 
sin(7r(m — 3)) j3in(7r(m + 3)) 



+ 



+ 



.sin(7ri(m - 3)) sin^m + 3)) J (14 ) 
Care must be taken that the denominator is not too close to zero. When this is the case, 
every exponential in the sum series yields 1, resulting in N for the sum. Hence, in a small 
interval — e < m < e, the expression 

sin(27rm) 

sin(27rm/Af) ( 15 ) 

returns N. 

In addition, the number of sine computations can be reduced by using the following 
trigonometric equalities 

sin (a + /?) = cos (a) sin(/?) + sin (a) cos(/3) 
15 cos (a; + p) = cos (a) cos(/?) - sin (a) sin(/3) 

For instance, expressing 

sin(7r(m — 2)) = sin(7r(m - 3) + tt) 
sm(n^(m - 2)) = sin(7r(m - 3) + ~) 

shows that all terms in equation (14) can be computed recursively from sin(7r(^ - 3)) and 
ao sin(£(^-3)). 
Again, since 

. . (m + 1)M . mM oX ttM n 

sin(7r ( ^^--3)) = sin ( 7r (_-3) + — ) 

. l.(m+l)M o ., . .mM ^ x ?r N 

Sm(?r M ( iV " 3)) = sm W— " 3) + iV ) 



25 the recursion is applied for computing the next sample m + 1 from values that were used for 
computing the sample m. 
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Table 1: Coefficients for different windows. 





a 


b 


c 


d 


Blackmann- Harris 


0.35875 


0.48829 


0.14128 


0.01168 


Blackmann 


0.42 


0.5 


0.08 


0 


Hamming 


0.54 


0.46 


0 


0 


Hanning 


0.5 


0.5 


0 


0 



Note that this solution is applicable to all windows that have a similar form as Eq. (2). 
They are listed in table 1. 

The computation of the analytic frequency response is illustrated in Fig. 5. 

1.4.3 Sampled Frequency Response Interpolation 

s Finally, when the table look-ups cannot be used and the analytic frequency response of the 
window cannot be computed, a third method can be used which is named Sampled Frequency 
Interpolation. Again, it computes the response for a frequency response in constant time but 
is less efficient that the previous methods. The non integer shift of W(m) over u) t is computed 
using a sampled spectrum W k obtained by an FFT of the zero padded window. Note that 
io also here, the division by zero must be avoided. This method is illustrated in Fig. 6. 



W(m) 

N-l 



= w n exp(— 2mm 



n — tiq 



n=0 
N-l 



N 



N-l 



) 



n — no 



n=0 
N-l N-l 



exp ( — 27T77i -~ — ) 

N J 



Wk X] exp(27rz(& - m) N 
sin(27r(& — m)) 



k=Q n=0 
N-l 



is 



fc=0 



sin(27r(A;-m)/iV) 



(16) 



2 Amplitude Computation 
2.1 Introduction 
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In this section, an efficient least mean squares technique is described for the computation 
of the amplitudes and phases. In WO 90/13887, the estimation of the amplitudes is claimed 
by detecting individual peaks in the magnitude spectrum, and performing a parabolic inter- 
polation to refine the frequency and amplitude values. In WO 93/04467 and WO 95/30983 
s a least means squares is presented which is applied iteratively on the signal, substracting a 
single sinusoidal component each time. 

The major difference with the present invention is that all amplitudes and phases are 
computed simultaneously for a given set of frequencies. This allows to resolve strongly 
overlapping frequency responses of sinusoidal components. As will be shown later, the orig- 
■ inal computational complexity of this method is G(K*N) where the K denotes the number 
of partials and N the number of partials. The invention however, solves this problem in 
0(Nlog(N)) and reduces the space complexity, which is originally 0(K 2 ) } to Q(K). The 
amplitude computation method is illustrated in Figure 8. 

2.2 Complex Amplitude Computation 

We describe how the amplitudes and phases are computed for all sinusoidal components 
We reformulate Eq. (1) as a sum of cosines and sines where the real part of the complex 
amplitude is denoted Aj = and the imaginary part as A\ = a ,sin^. The signal 

model for the short time signal x n can now be written as 



1 K ~ X / 

= w "2 ^ ( Ak ™V&™ k ~±) + At exp(-2niuj k ^p.)) 

k=Q x iV N J 

g [aI cos( 2 ™ fc ^) _ A{ sin(2 ™ A IlZ_^ (17) 



K-l 

VJn 



The error function X {A;Q) expresses the square difference between the samples i 



dowed signal x n and the signal model x n 



X (A;cD) = J2( x n-Zn) 2 



in the win- 



(18) 



The notation indicates that the error is minimized with respect to a vector of variables A for 
a given set of frequencies u> that are assumed to be known. The minimization is realized by 
putting the. derivatives with respect to the unknowns to zero 

dx(A;u>) d X (A;u>) 

dAj U '~dAf~-° . (19) 
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resulting respectively in 



K-l 



E^ ( E < cos(2vr^^=^) ooBCJJim;,^^) 



fc=0 



TV 



it / ^ 



fc=0 \n=0 



) C0S(27R^ — ) 



TV 

N-l 



^n^n cos(27ro; f — ) 



and 



K-l 



71=0 



^fc 2^^nCOS(27T^— -— )sm(27T^— — — ) 



(20) 



A;=0 
A'-l 



\n=0 



TV 



+ X>i ( E ™* sin(27T^^-^) sin(27ro;,I^) 



TV 



TV 

N-l 



TV 



= - ^2 x n w n sin(27ro;/ 



n—Q 



n — no 
TV 



) 



(21) 



These two sets of K equations have 2K unknown variables what can be written in the 
following matrix form 



(22) 



" B 1 ' 1 


B 1,2 




A r 




" C 1 " 


B 2 - 1 


B 2,2 




A' 




c 2 



with 



B th = Y] w l cos(27ro; fc n n ° ) cos(27ru;j 



n=0 



TV 



TV • ' 



B H = ~ ZX sin( 27 ra; fc ^p) cos(2™ r n " ^ 



n=0 
JV-1 



TV 



d2,1 



o2,2 



< OOs(27TWib— j^— ) sm(27R^ — ) 



71=0 

7V-1 



TV 



w n sm(27ra; fc - ) sm(27r^ — ) 



n=0 
JV-1 



TV 



C? = E x » w » cos (2^ — ° ) 



n=0 
JV-1 



a 2 



x n w n sin (2^ — ^r— ) 



71=0 



TV 



Under the condition that every sinusoid has a different frequency, the matrix B cannot have 
two linear dependent rows implying a unique solution for A. 
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The computational complexity of this method is very high, for instance, 

• the computation of the matrix B has a complexity Q(K 2 N) 

• the computation of the matrix C has a complexity O(KN) 

• the solution of the linear set of equations is 0(K 3 ) 

s Note that the order of magnitude of K and N is not significantly different, In the next 
sections, the complexity is reduced to 0(Nlog(N)). 

2.3 Efficient Computation of B 

Several optimizations for previous computation are disclosed. The main computational 
burden comes from the construction of the matrix B and the solution of the system of linear 
a equations which have complexity Q(K 2 N) and 0(K 3 ) respectively Following derivation 
shows that this can be improved considerably. We start by writing 

Bfi = f>^os(2™ fc Il^°) oob(W^) 

n=0 

= sX>* cos(2tt(^ + c*)^-_!!2) + cos(27rK - cut)u~*) 



2 

71=0 



= 5(»(y(«*+wi))+K(y(w fc -w»))) (23) 



with 



nm) = \ Xi^(exp(27rim^^)) 

n—Q 



(24) 



which is the time inverted fourier transform of the square window. In an analogue manner 
one obtains 



^8? = -kRpfa+ud-XiYfa-cj,))) (25) 



In figure 2 the frequency response of the squared Blackmann-Harris window is shown, which 
is clearly still bandlimited. This can be understood easily, considering that taking the square 
in the time domain is equivalent with a convolution in the frequency domain. 
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Since the window is real and symmetric, its frequency response is also real and symmetric. 
This means that B 1 - 2 and B 2 ' 1 only contain zeros. When defining the matrix Y~ />fc as 
Y(uj k — u>i) and the matrix Y + i tk as Y(uj k + oj{) one obtains 

B 1 - 1 = i(Y+ + Y-) (26) 
B 2 . 2 = -I(Y+-Y-) (27) 

Y(0) Y (w! - lo 0 ) Y (u 2 - ujq) Y(u 3 - to Q ) 

y _ = Y(u> 0 - ui X ) Y(0) y^a-wx) Y(w 2 -u; a ) 

y (w 0 - w 2 ) F(wi - cu 2 ) Y(0) Y (wa - w 2 ) 

Y (u>q - a> 3 ) Y(a^ - cu 3 ) Y(a; 2 - W3) Y(0) 

Y(2a; 0 ) Yfa + u> 0 ) Y{uj 2 + uj q ) Y(w 3 + w 0 ) 

Y + = Yiivo+uji) Y(2wi) y(cu!+cu 2 ) Y(wi+w 3 ) 

Y(w 0 + w 2 ) Y(cui + uj 2 ) ' Y (2tu 2 ) Y(u> 3 + u 2 ) 

Y(a; 0 + w 3 ) Y(a; 1 + a; 3 ) Y(a> 2 +u; 3 ) Y(2w 3 ) 

for a single harmonic sound source, this can be simplified further to 



Y- = 



Y(0) Y(u>) 


Y(2w) 


Y(3w) 


Y(-w) Y(0) 


Y(w) 


Y(2u) 


Y( 


-2w) Y(-w) 


Y(0) 




Y( 


-3w) Y(-2w) 


Y(-u/) 


Y(0) 




" Y(0) Y(w) 


Y(2w) 


Y(3w) 




Y(u) Y(2u) 


Y(3w) 


Y(4w) 




Y(2w) Y(3w) 


Y(4w) 


Y(5w) 




Y(3w) Y(4w) 


Y(8w) 


Y(6w) 



where a; is the fundamental frequency. 

The crucial observation that has to be made in this invention is that the matrix B becomes 
a band diagonal matrix when the sinusoidal components are sorted by their frequency. If the 
components are sorted, the frequency differences for elements close to the diagonal of Y~ is 
small and will fall in the main lobe of Y yielding a large value. For the elements far from the 
diagonal, the frequency difference is large and will fall outside the main lobe of Y. For Y + 
a similar reasoning can be applied. Elements close to the upper left corner have the smallest 
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frequency and can fall into the main lobe. The elements in the lower right corner have a 
very large value and can fall into the main lobe because of spectral replication. As a result, 
matrices B 1 * 1 and B 2 ' 2 are band diagonal which has a major impact on the computation 
time that is required to solve the equations. 

A typical method to solve a linear set of equations is the use of Gaussian elimination with 
backsubstitution. This method has a time complexity 0(K Z ). However, since the system is 
band diagonal, this method requires only a linear time complexity 0(K). 

In addition, the space complexity can be reduced from 0(K 2 ) to O(K) by storing only 
the diagonal band. Let us define a shifted matrix 

B i,k = Bi^k^D (28) 

where D denotes the number of diagonals that are stored around the main diagonal. Note 
that I = 0, . . . , L — 1 and k = 0, . . . , 2D. For combinations I) resulting in an index outside 
A, a zero value is returned. The amplitudes are computed directly from the shifted versions 
of B 1,1 , B 2 ' 2 . By denoting this routine as SOLVE this is written as 

A r = SOLVE^B^, C 1 ) 

A 1 = SOLVEiJ^, C 2 ) (29) 

Conclusions: 

• The original complexity for the computation of B was 0(K 2 N) 

• Using one of the constant time frequency response computation methods that were 
discussed in the synthesis section, this can be reduced to 0(K 2 ) 

• Since B is band diagonal, only elements that are close to the diagonal must be computed 
resulting in Q(K) 

• A second result of the band diagonal form of B is that the system can now be solved 
in O(K) instead of 0(K S ) 

• Finally, also the space complexity of B is reduced from (D(K 2 ) to O(K) 
This computation is illustrated in Figure 8. 

2.4 Efficient Computation of C 
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The results of the previous section moves the bottleneck to the computation of C which 
has a complexity O(KN). 

N-l 

Ci = y*] x n w n exp(2moJi U ^ ) 



n= 0 
iV-1 



= £ 



71=0 



N-l 

n 



^X m exp(27rim— ) 

_ro=0 

= J] X - W/r ( m + ^) • (30) 

5 771=0 

By taking the real and imaginary part respectively, C 1 and C 2 are obtained. Analogue to 
Eq. (10), only samples of the main lobe must be computed, yielding 

C t = XmWim + ut) (31) 

m^TTljnin 

Again, the frequency response W{m) can be computed by previously described methods. 
io The fourier transform of x Tl however requires a time complexity O(NlogN), 
This computation is illustrated in Figure 8. 

2.5 Amplitude Computation Pre-processing 

The goal of the pre-processing before the amplitude computation is twofold. On one 
hand the frequencies are sorted in order to obtain a band diagonal matrix for B. In addition, 
is frequencies that occur twice result in two exact rows in B making it a singular matrix. 
Therefore, no double frequencies are allowed for the frequency computation. 

On the other hand, the preprocessing determines how many diagonals of the matrix B 
must be taken into account. This is done by counting the number of sinusoidal components 
that fall in the main lobe of each frequency response. The maximum number of components 
20 over all frequency responses yields the value for D, 

The amplitude estimation preprocessing is depicted in Figure 9. 

3 Frequency Optimization 

3.1 Introduction 

The previous sections described an efficient method to compute the amplitudes and phases 
25 for a given set of frequencies. In this part of the invention, methods are presented to get 
the initial values for the frequencies and methods to optimize these frequencies with respect 
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to the error criterium in an iterative manner. This optimization requires the computation 
of the gradient and hessian of the error function which normally require time complexity 
0(K 2 N). The invention discloses a method that reduces the computational complexity to 
0(N log(AQ). Note that still a variable window length is used. 
5 For the optimization of the frequencies two types of signal models are distinguished. One 
containing arbitrary frequency components, as given previously by 

1 K ~ 1 f n ~ - \ 

$n = w n - [Ah exp(27riu k n N n ° ) + A* k exp(-2muj k n N U ° )j + r n 

On the other hand when the sound sources are known to be pitched, a model of harmonic 
spectra can be used which is written as 

5n = Wn 2 X] p« exp(2mgu)k^j^ 1 ) + A^ q exp(~2™^ fc ^-^) j + r n (32) 

In this last case, the model consists of K sources each modelled by Q k harmonic components. 
For the first model, all frequencies must be optimized while for the second model, only the 
fundamental frequencies are optimized. As a result, a different frequency optimization is 
disclosed for each model. The amplitude estimation on the other hand, stays the same for 
is both models. 

3.2 Initial Frequency Values 

The error function is very non linear in function of the frequencies and can therefore get 
stuck in local minima. Therefore, the initial values of the frequencies plays an important 
role. Different methods can be used 

20 • (Multi)pitch estimation: For the harmonic signal model (Eq. 32), any (multi)pitch 
estimator can be used to compute a number of pitches from the original signal. From 
these pitches, sets of frequencies can be produced depending on prior knowledge of 
the sound source. For instance a harmonic series of frequencies for periodic sounds, 
a inharmonic frequency series for strings etc. Also by matching the spectrum to a 

25 database of spectra that are labelled with a series or frequencies the initial values can 

be obtained. 

• Peak picking- For the non harmonic model, individual local maxima can be detected 
in the sampled spectrum obtained by an FFT. Note that in that case the analysis 
window must be sufficiently large. 
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• Prior knowledge. The frequencies and / or pitches can also be read from an external 
file such as a musical score or an annotation of the signal. 

This is depicted in Figures 10. 

3-3 Local Quadratic Approximation 

The problem of minimizing continuous differential functions of many variables is widely- 
studied and any of the conventional methods can be applied. Many of these methods make 
a local quadratic approximation of the error function. Let us consider a second order Taylor 
expansion around the error function %, only this time, the amplitudes are kept fixed and the 
variation in function of the frequencies to is studied. This is denoted x(cD; A). The expansion 
of x(^; A) around a given value v results in a local quadratic approximation denoted as 

Vx(a>;A) ~ h| 0 + H| € (a>-0) (33) 
where h| 0 is the gradient and the Hessian matrix of x evaluated at v. 

rr | - QX&A) 
duJiOLUk 

When v is a minimum of the error function, the gradient at v is zero so that Eq. (33) becomes 

Vx(p\A) ^ H| o (O-0) . (34) 

Note that for the local approximation, the gradient is linear in function of u) and the Hessian 
independent of Q from which follows that H| 0 is equivalent with H|©. For a given <D the 
minimum v can now be approximated using 

0 Q-Ul^Vxi^A) (35) 

Two major classes of optimization algorithms can be distinguished. The first class are the 
gradient descent methods which use first order information of the error function in order to 
optimize the frequency values 

s (r) = s (r-l) _ ^r-i ( 36 ) 

20 



PC I/BE 0 3 / 0 0 2 S7 



The superscript W denotes the index of the iteration and rj the learning rate. Many different 
variations have been proposed such as using a momentum term, line search algorithms, using 
conjugate gradients and scaled conjugate gradients. 

Prom Eq. 34 follows that the error function can be optimized using 

DU=oW-H\^h\ a ^> (37) 

This is called Newton's method. Because of the large computational cost for the inversion of 
the Hessian of the matrix, quasi-Newton methods have been developed which construct the 
inverse matrix iteratively. In the present invention however, it is shown that the hessian is a 
band diagonal matrix of which the inverse can be computed in linear time. 

Finally, following termination criteria can be used to stop the iterative optimization of 
the frequencies. 

• stop after a fixed number of iterations 

• stop after fixed computation time 

• stop when error function drops below a specified value 

• stop when the error change drops below a specified value 

• stop when error measure starts to increase. 

3.4 Efficient Computation of the Gradient 

An efficient method is disclosed which allows to compute the gradient of the error func- 
tion with respect to the frequencies in 0(Nlog(N)) time. Each element of the gradient is 
computed in constant time, implying 0(K) for all elements. However, the FFT of the noise 
residual R m is required implying 0(Nlog(N)). The gradient of the error function for the 
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non harmonic model results in 
<9 X (u); A) 



dtoi 



N-l 



n-i r 

- E 2 

n=0 
N-l 

= E 



- u; n » exp(2muj k n n ° ) + A* k exp{-2muj k U ^ )\ 



/c=0 



" w n % ( ^ exp(27riu k ^—~) + A* k exp{-2mu) k b — — ) 



n — no, 



/rt . rc — n 0 . .n — n Q 1 _ , n . n — n 0 f .n — n Q \\ 
At exp(2muji—jj—)2m— w n -Af exp(-2 W — ^— ) (^-27r2— — 1 J 



JV-1 



2^ ~n R ™ exp(27rcro — ) 

ri—Q \_m—Q 

~^ n ^z exp(27m^— ^— )2ttz — + w n A* exp(-2muji — — )2ttz— — ) 1 



iV-1 



m=0 
N-l 



N-l 



Ai ]P ^n2?rz n n ° exp(2?ri(m + " Q ) 



n — no , 



n=0 



+ A* w n^i - N U ° exp(2m(rn - ^'* u ) 
= ^^R m (A* l W'{m-u }l )-A l W\m + uj l )) 



N 



N 



n — no. 



m=0 



771=0 



(38) 



where m m i n and m max are dependent of lu l and are computed from the bandwidth (3 of W'{m) 
as in Eq. (10). 
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For the harmonic model we obtain 

d x (z>; A) 



dujt 

— T 



n=0 



N-l 



= E 2 



n—0 



fc=0 g=l ^ 



1 . n .n-n 0 , n - n 0 



g=l 



- ^4 X ^27ri^^exp(-27ri^ 



2 ^ 
iv-i nv-i 1 



. n — n 0 x 
mm — — — ) 



N 
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(39) 



Hence, W'(m) is the first fourier derivative of W(m) which still results in a bandlimited 
frequency response as is shown in Fig.l. 
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3.5 Efficient Computation of the Hessain Matrix 

In this section it is derived that the Hessian matrix for the non harmonic signal model is 
band diagonal. An efficient method is disclosed which computes each element of the Hessian 
in constant time. The computation of all band diagonal elements is computed in O(K) 
time. Since the fourier transform of the noise residual R m is required the total complexity is 
0(Nlog(N)) 

The Hessian for the non harmonic model is computed 
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and continuing the computation for each term separately, the first term yields 
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For the harmonic spectrum model one achieves 



(40) 



where 5 ip denotes the Kronecker symbol and the m min and m maa; values are dependent on 
qoj p . W"(m) can again be computed by one of the frequency response calculation methods 
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that were previously described. Note that, 
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with 



Y"(m) = f^^^^xp^m^) 

which is shown in Fig. 2. Here, a same conclusion can be drawn as for the matrix B which 
was used for the computation of the amplitudes. The first term of the hessian computation 
resulted only in nonzero values for the diagonal elements of the Hessian. The second term 
also produces significant values for the main diagonal elements since the frequency differences 
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fall in the main lobe of Y n (m). Therefore, the Hessian is a band diagonal matrix, implying 
that only the elements close to the diagonal must be computed. In addition, the inverse 
matrix of the Hessian can be computed in linear time allowing to apply Newton's method. 
For the harmonic model one achieves 
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Only components that have close frequencies must be evaluated due to the band limited 
nature of Y". For instance for a given value q, and a given frequency response bandwidth /3, 
only the r values must be considered for which tloI falls in the main lobe. Since 

N 

0 < qto p << — 
N 

0 < ru>i << — 



the input values of Y" are bounded by 

JV ^ N 

-y < qu P - r^i << — 

0 < qu p + ruJi << N 

meaning that the main lobe of Y(qu p — rtx>i) ranges from -§/3 to §(3. For Y(qu p + rw,) the 
main lobe is divided over the left and right side of the spectrum due to spectral replication 
yielding the intervals [0, jfep] and [N - N]. This implies that for Y(quj p - ru>i) only the 
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r values must be considered for which 
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For the harmonic model the hessian is not band diagonal but evidently, its size very small 
compared to the hessian for the non harmonic case. In addition, each element can be com- 
puted in linear time. 



20 3.6 Efficient Frequency Optimisation 
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For the optimisation of the frequencies we distinguish a harmonic and a non-harmonic 
signal model. For the non-harmonic model, the Hessian matrix was shown to be band diagonal 
and invertible in linear time. Therefore only a shifted matrix of the Hessian denoted H needs 
to be computed. The frequencies are updated using 

u (r+i) = ^(r+l) — H" 1 <g> h (44) 

where ® denotes the multiplication of a shifted matrix with a vector and ( r ) the index of the 
iteration. The outline of the computation is depicted in Figure 11. 

For the harmonic model, the Hessian is not band diagonal but its very limited in size 
since normally only a few sources are available in the signal. The fundamental frequencies 
are optimized using 

The outline of the computation is depicted in Figure 12 and 13. 
4 The complete method 

In figure 14, the complete analysis / synthesis method is depicted. This method can be 
applied for a harmonic or non-harmonic signal model. 

First, the initial frequencies are computed (Figure 10). In the case of the harmonic model, 
a (multi)pitch estimator is used which determines an initial set of pitches from which a series 
of frequencies is computed for each source. For the non-harmonic model, peak picking can 
be used on the spectrum of the signal. 

The preprocessing routine sorts the frequencies, removes frequencies that occur multiple 
times and determines how many diagonal bands D must be considered for the amplitude 
computation (Figure 9). Then, the amplitudes are computed (Figure 8). This is realized by 
computing the band diagonal elements of matrix B and storing them in a shifted form B . 
The matrix G is computed next and by solving the band limited system, the amplitudes are 
obtained. 

The IFFT syntesizer computes the spectrum X m from the amplitudes A and frequencies 
Q. The difference with the original spectrum X m results in the residual spectrum i? m which 
is used for the frequency optimization in the next step. 

In the case of the non-harmonic model, the Hessian matrix H is band limited and is stored 
in a shifted form H. A second result of this property is that its inverse can be computed in 
linear time. The hessian and gradient are used to optimize the frequency values (Figure 11). 
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In the harmonic case, the fundamental frequencies of the different sources is optimized. 
Here, the Hessian is not band diagonal but it is very small since typically just a few sources 
are considered (Figure 12 and 13). 

The iterative loop is continued until a stopping criterhim is met. The results of the analysis 
are; the synthesized signal x n , the noise residual r n , the amplitudes A and frequencies CD. Note 
that the amplitudes are complex and therefore contain the phases. 

5 Applications 

The optimization of the computational efficiency of this highly accurate method signifi- 
cantly improves a large number of applications such as; multi-pitch extraction, parametric 
audio coding, source separation, audio classification, audio effects, automated transcription 
and annotation. These applications are depicted in Figure 15 

5.1 High Resolution (Multi)Pitch Estimation 

The efficient analysis method will improve pitch estimation techniques. For the current 
invention, the term "pitch" is used to denote the period of a harmonic source, and not the 
perceived height by the human auditory system. For instance, current (multi)pitch estimators 
based on autocorrelation such as the summary autocorrelation function (SACF) and the 
enhanced summary autocorrelation function (ESACF) allow to estimate multiple pitches. 
However, none of this methods takes into account the overlapping peaks that might occur. 
The frequency optimization for harmonic sources which is presented in this invention allows 
to improve the fundamental frequencies iteratively leading to very accurate pitch estimations. 
In addition, very small analysis windows can be used which enable to track fast variations in 
the pitch in an accurate manner. 

5.2 Parametric Audio Coding 

The resynthesis of the sound is of a very high quality indistinguishable from the original 
sound. In addition, the amplitudes and frequency parameters vary very slowly over time. 
Therefore, it is interesting to apply our method in the context of parametric coders where 
these parameters are stored in a differential manner what results in a considerable compres- 
sion. Evidently this is interesting for the storage, transmission and broadcasting of digital 
audio. 
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5.3 Source Separation 

When a multipitch estimator provides good initial values of the pitches the method op- 
timizes all parameters so that an accurate match is obtained. By synthesizing each pitch 
component to a differen signal, the different sound sources in the polyphonic recording can 
be be separated. 
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5.4 Automated Annotation and Transcription 

Fast variations in the amplitude A and frequencies Co indicate the beginning and end of a 
note. Therefore the method will contribute to the automatic annotation and/or transcription 
of the audio signal. 

5.5 Audio Effects 

By modifying the frequencies and amplitudes of the different sinusoidal components high 
quality audio effects can be achieved. The power of this method lies in the fact that fre- 
quencies and amplitudes can be manipulated independently. This allows for instance time- 
stretching, sound morphing, pitch changes, timbre manipulation etc, all with a very high 
quality. 
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DETAILED DESCRIPTION OF THE FIGURES 

Figure 1 illustrates the band limited nature of the frequency response of the Blackmann- 
Harris window, denoted W(m). Also the first and second derivative, denoted W f (m) and 
W"(m) respectively, are band diagonal. 

Figure 2 illustrates frequency response of the zero padded Blackmann-Harris window 
J>7$(m), the squared Blackmann-Harris window Y{m) and its second derivative Y"(m). Also 
these frequency responses are band limited. 

Fig. 3 illustrates the theoretic motivation for a scaled table look-up. A time domain 
window w N (n) 1 is taken which is handlimited in the frequency domain within a range 
[-/?,/?] 4. When this window is zero padded up to a length P 2 this results in a scaling 
in the frequency domain 5. Then, the spectrum is truncated in order to obtain a length N 
again 6. When taking the inverse fourier transform of this truncated spectrum, a window 
with length M zero padded up to a length N is obtained 3. 

Figure 4 depicts a method, of computing the frequency response of a sampled zero padded 
window with a variable length, according to the invention, by a scaled look-up table. The 
look-up table 10 is pre-generated 7, 9, from a non-zero-padded analysis window w n 8. This 
results in a table 10 containing an oversampled frequency response W N {m). The frequency 
response of the zero-padded window with length M smaller than N, is generated in step 11 
by doing a table look-up with the value m~ returning the desired value 12. 

Figure 5 depicts a method of computing the frequency response of a sampled zero padded 
window with a variable length, according to the invention, using an analytic frequency re- 
sponse calculator 13 as described in Eq. (14). For a given value m the calculator returns the 
frequency response of the zero padded window 14. 

Figure 6 depicts a method of computing the frequency response of a sampled zero padded 
window with a variable length, according to the invention, using an interpolation calculator. 
The frequency response of the zero padded sampled window is calculated using an inter- 
polation method of the sampled frequency response 16 obtained by a FFT 15 of a zeros 
padded 20 window 19. The interpolator calculator 18 computes from the sampled frequency 
response 16 the variable length frequency response 17 using Eq. (16). 

Figure 7 depicts the detail of a method of synthesizing a sound signal with variable length 
according to the invention. For each sinusoidal component k 21, the range of values for m is 
determined which fall in the main lobe of the frequency response 22. For each sample 23 the 
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value of the frequency response with its center at u k is computed 24 and multiplied with an 

amplitude A k 25 (Eq. 6). When this is repeated for each component k, the inverse fourier 

transform is taken 26 and the zero padding and imaginary part are removed 27. This results 

in the synthesize signal x n 28. A summary of this routine is denoted by 29. The synthesis 

has a complexity 0(Nlog(N)) since it contains an inverse FFT. 

Figure 8 depicts the detail of a method of computing the amplitudes of the sinusdoidal 

components in a sound signal in (D(N log TV) time, according to the invention. The amplitudes 

A are computed 39 from a spectrum X m for a given set of frequencies. This is realized by 

constructing the matrices C 1 , C 2 34 and the matrices B 1 ' 1 , B 2 ' 2 37 according to Eq. (25) 

and Eq. (30). By solving the set of equations represented by these matrices the amplitudes 

are computed 38. The matrices C 1 and C 2 are computed by determining for all partials I 

30 the range of m values 31, 32 of the main lobe and computing the value of each element 

according to Eq. (30) 34. The frequency response W£f f is determined from the frequency 

response calculater of W$(m), 33. For the matrices B 1 ' 1 and B 2,2 , only shifted matrices 

B 1 - 1 and B 2 ' 2 are computed containing only its band diagonal elements. The width of the 

< < 

band is denoted D, For all k values from 0 to 2D 35 each row of the matrices B 1 * 1 and B 2 ' 2 
is computed 37 according to Eq. (25). The computation of these elements requires a variable 
lengths frequency response calculator Y$(m) } 36. The equations denoted in Eq. (22) can 
now be solved directly on the shifted versions of B 1 ' 1 , B 2 ' 2 , 38 yielding the amplitude values 
39. A summary of the computation is denoted by 50 

Figure 9 depicts the pre-processing that is needed before the computation of the ampli- 
tudes according to the invention. First the frequencies are sorted 40 in order to guarantee 
a band diagonal form of B. Each frequency 41, is compared with the next frequency, and 
when this difference is smaller than a value e 42, the component is removed 43. This is done 
to avoid the singularity of B. Then, the frequencies above and below cj l are considered 45 
and the counter I 44, 46 counts the number of components that fall in the main lobe with 
width Pjfe. The maximum of this number D 47, 48 over all components I yields the number 
of diagonals that must be taken in account. A short notation of the pre-processing is denoted 
by 51. 

Figure 10 depicts several methods for the computation of the initial frequencies 56, 64, 
62, according to the invention. A first method, takes the FFT 53 of the windowed signal x n 
66 which is zero padded 52 up to a power 2 length. The local maxima are detected 55 from 
the spectrum X m 54 providing the initial frequency values. This method is well suited for the 
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non-harmonic signal model. A second method is more suited for the harmonic signal model. 
First a (multi)pitch estimator 58 is used such as the enhanced summary autocorrelation 
function (ESACF) which computes a number of pitches 59 from the windowed signal 57. 
For each detected pitch, a series of multiples of the pitch is produced 60 up to the Nyquist 
frequency. The series of frequencies of each source is denoted as Qq , tDi ... . These series together 
provide the initial values for the frequencies 62. Finally, an externally stored annotation or 
score 63 can be used which contains the pitches in the sound 65 or the individual frequency 
components. 

Figure 11 , depicts the frequency optimization for the non-harmonic model according to 
the embodiment of the invention. First the gradient and Hessian matrix are computed. Each 
element of the gradient hi, 69, is computed with respect to Eq. (38) 72 over a range 70 of m 
values 71. In addition, a frequency response calculator is used for Wjff (m) 74. Over the same 
range, the diagonal elements of the Hessian are computed according to Eq. (40) 72, using a 
frequency response calculator 74 for Wff . Then the band diagonal elements of the Hessian 
are computed according to Eq. (42) 75 76 using a frequency response calculator for Yj$* 77. 
The frequencies 79 are optimized by computing the inverse of the shifted Hessian matrix H 
and multiplying this with the gradient h, according to Eq. (44). Also short notation for the 
frequency optimization is given 138. 

Figures 12 and 13 depict the frequency optimization for the harmonic model according 
to the embodiment of the invention. First, the gradient and Hessian matrix are computed. 
For each fundamental frequency uji , 80, the gradient value hi is computed over all partials 
q 81. For each element of the gradient h h the range 82 of m values is determined 83 that 
lie in the main lobe of Wj$. The gradient is computed according to Eq. (39), This requires 
a frequency response calculator for Wjff{m) 86. In the same loop, the diagonal elements of 
the Hessian are computed according to Eq. (40) 84, using a frequency response calculator 
85 for W'm*. The method continues by going to the subroutine depicted in Figure 13 for the 
computation of the non-diagonal elements of the Hessian matrix 87, 91. Here, the elements 
Hi t k are computed with respect to Eq. (43) 95, 98, 102 using a frequency response calculator 
for Y^ N (m) 99. The differen ranges of r are determined 93, 94, 96, 97, 100, 101 that are 
required for this computation. Then the subroutine returns to the main routine 103, 88. 
Finally, the optimized frequencies are obtained 99 by computing Eq. (45) 89. Also here the 
short notation applies, 138. 

Figure 14 depicts the complete Analysis/Synthesis method according to the embodiment 
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of the invention. Starting from a windowed short time signal x n 104, the initial values of the 
frequencies 107 are computed 105. These frequencies 112 are then pre-processed 108 and 
the number of diagonal bands D 109 is determined. Then the amplitudes are computed 113 
from the fourier transform 110, 111 of the signal X m , the number of diagonal bands 109 and 

s the pre-processed frequencies 112.This results in the amplitudes 114 which together with 
the frequencies are used to synthesize the signal yielding x n 121 X m 116. The difference 
109 between the synthesized spectrum X m 116 and the original spectrum X ra 111 yields 
the error spectrum i? m 117. This error spectrum 117, the frequencies 112 and amplitudes 
114 are used to optimize 106 the frequency values 106 for the next iteration. A stopping 

o criterium evaluator determines 118 whether the iteration is continued. Several criteria were 
described in section 3.3. When the criterium is met, the iteration is terminated 119. A short 
notation is depicted 122 which takes as input the signal x n and produces a sythesized signal 
x n , the noise residual r n , the amplitudes A and frequencies u>. 

Figure (15) shows several applications of the analysis method according to the embodi- 

5 ment of the invention. In 139 the application of the analysis method as a high resolution 
pitch estimator is depicted. 140 depicts the use of the invention in the context of a para- 
metric coder 124. At the sender side, the encoder converts the amplitudes A, frequencies u> 
and noise residual r n to a bitstream 125 which can be stored, broadcasted or transmitted 
126. At the receiver side, the decoder computes the amplitudes A, frequencies cj and noise 

□ residual r n back from the bitstream. The amplitudes and frequencies are then used for the 
synthesis of the deterministic part of the sound and by adding the noise, the signal is recon- 
structed. In 141 the use of the analysis method is depicted for high quality audio effects. 
An effects processor 130 produces the modified noise r*, amplitudes .4* and frequencies Q* 
131 from which the processed sound signal 132, can be constructed 132. Fig. 142 shows 

s the application of the analysis method according to the invention to the separation of sound 
sources. All frequencies Oi and amplitudes Ai from each source are grouped 134 135, and 
then synthesize separately 136 resulting in the individual sources 137. 
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CLAIMS 

What we claim is 

1. A method for modelling, i.a. analyzing and/or synthesizing, a windowed signal with a 
variable length, such as sound or speech signals, by computing the frequencies and com- 
plex amplitudes from the signal by using an optimized least squares method, whereby 
the method is executed using one or more variable length windows and has a complexity 
O(NlogN). 

2. A method according to claim 1 using a harmonic signal model according to Eq* 32 
and/or a non-harmonic signal model according to Eq. 1. 

3. A method to compute the frequency response of a window with length M zero padded 
up to a length N comprising a scaled table look-up with a value resulting in the 
values W^(m). 

4. A method to compute efficiently the frequency response of a window with length M 
zero padded up to a length N comprising the computation of the analytic frequency 
response as given by Eq. (12). 

5. A method to compute efficiently the frequency response of a window with length M 
zero padded up to a length N consists of interpolating the sampled frequency response 
as given by Eq. (16). 

6. A method according to any of claims 1 to 5 further comprising the step of: 
estimating the amplitudes using an optimized O(NlogN) least squares estimation 
technique for variable length windows. 

7. A method according to any of the previous claims 1 to 6, further comprising the step 
of: 

synthesizing a variable length sound signal by constructed its spectrum according to 
Eq. (6) and performing an inverse fourier transform whereby the synthesis has a time 
complexity of O(NlogN). 

8. A method according to any of the previous claims 1 to 7 further comprising the step 
of: 

37 



PC T/8E -0 3/ 00207 



optimizing the frequency values using Newton's method for the non-harmonic signal 
model according to Eq. (44) and/or the harmonic signal model according to Eq. (45). 

9. A method according to any of the previous claims 1 to 8 further comprising: 
a stopping criterium evaluator. 

10. a method according to any of the previous claims 1 to 9 further comprising the step of: 
determining the initial values of frequencies by peak picking from the spectrum. 

11. A method according to any of the previous claims 1 to 10 further comprising the step 
of: 

determining the initial values of the fundamental frequencies for the harmonic signal 
model using preferably a multi-pitch estimator. 

12. A method according to any of the previous claims 1 to 11 further comprising the step 
of: 

pre-processing before the amplitude estimation of claim 5, whereby this pre-processing 
comprises the steps of: sorting the frequencies, eliminating multiple occurring frequen- 
cies and determining the number of relevant diagonal bands. 

13. A method of claim 12 further comprising the steps of computing the amplitudes by 
solving the equation given in Eq. (22), using From Eq. (25) such that only the elements 
around the diagonal of B are taken into account, whereby a shifted form B is computed 
containing only D diagonal bands of B according to Eq. (28) and Eq. (25), whereby 
the computation of the Eq. (25) requires the computation of the frequency response of 
the square window denoted Y(m), a matrix C is computed using Eq. (30), requiring 
computation of the frequency response of the window denoted W(m) preferably using 
the method of claim 2,3 or 4, and solving equation solving Eq. (22) on B and C (Eq. 
(29)). 

14. A method of claim 8 for the non-harmonic signal model, further comprising the step 
of: 

optimizing the computation of the gradient h and Hessian matrix H, said gradient is 
computed from the residual spectrum R m , the amplitude A\ and the derivative of the 
frequency response of the window denoted W\ra) as given by Eq. (38), whereby Eq. 
(42) and Eq. (40) imply that only the diagonal band of the Hessian matrix contains 
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significant values, whereby the Hessian is stored in a shifted form H, the diagonal 
elements of the Hessian are computed from the residual spectrum R m) the amplitudes 
Ai and the second derivative of the frequency response of the window denoted W ,/ {m) 
.as given by Eq. (40), the second term of the Hessian given by Eq. (42) requires the 
second derivative of the frequency response of the square window, whereby all frequency 
responses are computed preferably using one of the methods in claim 2,3 or 4; and 
the diagonal Hessian is inverted in linear time and used to optimize the frequencies 
according to Eq. (44). 

15. A method according to claim 8, for the harmonic signal model, further comprising the 
step of: 

optimizing the computation of the gradient h and Hessian matrix H, whereby the 
gradient is computed from the residual spectrum i? m , the amplitudes Ai iq and the 
derivative of the frequency response of the window denoted W'(m) as given by Eq. 
(39), Eq. (43) and Eq. (41) compute the Hessian matrix H whereby all frequency 
responses are computed using preferably one of the methods in claim 2,3 or 4. 

16. Apparatus wherein the method according to any of the previous claims 1 to 15 is 
implemented. 

17. Use of a method according to any of the claims 1 to 15 or an apparatus according to 
claim 16 for accurate pitch estimation. 

18. Use of a method according to any of the claims 1 to 15 or an apparatus according to 
claim 16 for parametric audio coders, where the noise residual, amplitudes and fre- 
quencies are encoded in a bitstream whereby said bitstream is stored, broadcasted or 
transmitted at the sender side, the receiver decodes the bitstream back to the param- 
eters and synthesizes the sound. 

19. Use of a method according to any of the claims 1 to 15 or an apparatus according to 
claim 16 for audio effects where the noise r n , whereby the amplitudes A and frequencies 
CD are manipulated by an effects processor yielding r*, v?* and d>*. 

20. Use of a method according to any of the claims 1 to 15 or an apparatus according to 
claim 16 for source separation, by separating the frequencies and amplitudes from the 
same source, and synthesizing monophonic sources from these parameters separately. 
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Use of a method according to any of the claims 1 to 15 or an apparatus according to 
claim 16 for automated annotation and transcription whereby the the signal is seg- 
mented according to the values of the amplitudes and frequencies. 
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Abstract 

The present invention relates to a method for modelling, i.a. analyzing and/or 
synthesizing, a windowed signal with a variable length, such as sound or speech signals, 
by computing the frequencies and complex amplitudes from the signal by using an 
optimized least squares method, whereby the method is using one or more variable length 

windows and has a complexity O (iVlog N). 
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Figure 1: Top: Frequency Response of Blackmann-Harris W{m), Middle: First Derivative 
W'(m), Bottom: Second Derivative, W"(m) 
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Figure 2: Top: Frequency Response of zero padded Blackmann-Harris Window W^(ra), 
Middle: Frequency Response of Squared Blackmann-Harris Window Y(m), Bottom: Second 
Derivative of the Frequency Response of the Squared Blackmann-Harris Window Y ;/ (m) 
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Figure 3: Theoretic Motivation for Scaled Table Look-up 
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Figure 4: Scaled Table Look-up 



PC T/BE 0 3 / 0 0 2 0 7 



4/13 



Window Length M 
Padded Window Length N 
Input Value m 



\ 



13 



Variable Length 



— ► 


Analytic Frequency 




Response Calculator 


► 


— ► 







\ 



14 



Figure 5: Analytic Frequency Response Computation 
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Figure 6: Sampled Frequency Response Interpolation 
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Figure 7: Synthesis 
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Figure 8: Amplitude Computation 
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Figure 9: Preprocessing of the Amplitude Computation 
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Figure 10: Initial Frequency Calculation 
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Figure 11: Frequency Optimiser for Non-Harmonic Model 
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Figure 12: Frequency Optimiser for Harmonic Model 
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Figure 13: Frequency Optimiser for Harmonic Model (subroutine) 
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Figure 14: Complete Analysis/Synthesis Method 
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Figure 15: Applications; High Accuracy Pitch Estimation, Coding, Effects and Source Sep- 
aration 



